Real-to-Sim Robot Policy Evaluation with Gaussian Splatting Simulation of Soft-Body Interactions

Zhang, Kaifeng, Sha, Shuo, Jiang, Hanxiao, Loper, Matthew, Song, Hyunjong, Cai, Guangyan, Xu, Zhuo, Hu, Xiaochen, Zheng, Changxi, Li, Yunzhu

arXiv.org Artificial Intelligence

Robotic manipulation policies are advancing rapidly, but their direct evaluation in the real world remains costly, time-consuming, and difficult to reproduce, particularly for tasks involving deformable objects. Simulation provides a scalable and systematic alternative, yet existing simulators often fail to capture the coupled visual and physical complexity of soft-body interactions. We present a real-to-sim policy evaluation framework that constructs soft-body digital twins from real-world videos and renders robots, objects, and environments with photorealistic fidelity using 3D Gaussian Splatting. We validate our approach on representative deformable manipulation tasks, including plush toy packing, rope routing, and T-block pushing, demonstrating that simulated rollouts correlate strongly with real-world execution performance and reveal key behavioral patterns of learned policies. Our results suggest that combining physics-informed reconstruction with high-quality rendering enables reproducible, scalable, and accurate evaluation of robotic manipulation policies. Website: https://real2sim-eval.github.io/
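The abstract's claim that simulated rollouts correlate strongly with real-world execution can be made concrete with a simple correlation check over per-policy success rates. A minimal sketch, where the success-rate numbers are invented for illustration and not taken from the paper:

```python
import statistics

def pearson_corr(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-policy success rates in simulation vs. on the real robot.
sim_success = [0.82, 0.55, 0.91, 0.40, 0.70]
real_success = [0.78, 0.50, 0.88, 0.35, 0.66]

r = pearson_corr(sim_success, real_success)
```

A correlation near 1 over a set of candidate policies is what makes such a simulator useful as an evaluation proxy: it preserves the ranking of policies even if absolute success rates differ.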


Closing the Sim2Real Performance Gap in RL

Anand, Akhil S, Sawant, Shambhuraj, Hoffmann, Jasper, Reinhardt, Dirk, Gros, Sebastien

arXiv.org Artificial Intelligence

Sim2Real aims to train policies in high-fidelity simulation environments and transfer them effectively to the real world. Despite the development of accurate simulators and Sim2Real RL approaches, policies trained purely in simulation often suffer significant performance drops when deployed in real environments. This drop is referred to as the Sim2Real performance gap. Current Sim2Real RL methods optimize simulator accuracy and variability as proxies for real-world performance. However, these metrics do not necessarily correlate with the real-world performance of the policy, as established theoretically and empirically in the literature. We propose a novel framework to address this issue by directly adapting the simulator parameters based on real-world performance. We frame this problem as a bi-level RL framework: the inner-level RL trains a policy purely in simulation, and the outer-level RL adapts the simulation model and in-sim reward parameters to maximize the real-world performance of the in-sim policy. We derive and validate in simple examples the mathematical tools needed to develop bi-level RL algorithms that close the Sim2Real performance gap.
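The bi-level structure can be illustrated with a deliberately tiny toy problem: the real system has a hidden friction coefficient, the inner level computes the sim-optimal action in closed form for a one-step reaching task, and the outer level searches the simulator's friction parameter to maximize the real-world return of that in-sim policy. The dynamics, friction values, and closed-form inner solve are all invented for this sketch and are not the paper's algorithm:

```python
REAL_FRICTION = 0.3  # hidden from the learner; stands in for the real world

def train_in_sim(theta):
    """Inner level: sim-optimal action for a one-step task where applying
    action a moves the state (starting at 1.0) by a * (1 - friction)."""
    return -1.0 / (1.0 - theta)  # closed-form optimum under sim friction theta

def real_return(action):
    """Outer-level objective: return of the in-sim policy on the real system."""
    return -(1.0 + action * (1.0 - REAL_FRICTION)) ** 2

# Outer level: adapt the simulator parameter theta to maximize real return
# of the policy trained inside that simulator.
candidates = [i / 100 for i in range(0, 90)]
best_theta = max(candidates, key=lambda th: real_return(train_in_sim(th)))
```

Here the outer loop recovers the real friction (0.30), because the simulator parameter that makes the in-sim policy perform best on the real system is the one matching reality; in general the framework only requires real-world return evaluations, not knowledge of the true parameters.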


Post-Convergence Sim-to-Real Policy Transfer: A Principled Alternative to Cherry-Picking

Khor, Dylan, Weng, Bowen

arXiv.org Artificial Intelligence

Learning-based approaches, particularly reinforcement learning (RL), have become widely used for developing control policies for autonomous agents, such as locomotion policies for legged robots. Starting from a randomly initialized policy, the empirical expected reward follows a trajectory with an overall increasing trend. While some policies become temporarily stuck in local optima, a well-defined training process generally converges to a reward level with noisy oscillations. However, selecting a policy for real-world deployment is rarely an analytical decision (i.e., simply choosing the one with the highest reward) and is instead often performed through trial and error. To improve sim-to-real transfer, most research focuses on the pre-convergence stage, employing techniques such as domain randomization, multi-fidelity training, adversarial training, and architectural innovations. However, these methods do not eliminate the inevitable convergence trajectory and noisy reward oscillations, leading to heuristic policy selection, or cherry-picking. This paper addresses the post-convergence sim-to-real transfer problem by introducing a worst-case performance transference optimization approach, formulated as a convex quadratically constrained linear programming problem. Extensive experiments demonstrate its effectiveness in transferring RL-based locomotion policies from simulation to real-world laboratory tests. Figure 1(b) in the paper illustrates the average reward trajectory from training a locomotion policy for the Unitree G1 humanoid robot in Isaac Gym using RL [1] with random seed 50; initially, the randomly initialized policy yields a low training reward.
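The paper formulates transference as a convex quadratically constrained linear program; a much-simplified discrete analogue of the same worst-case principle is to score converged checkpoints by their minimum reward across perturbed evaluation conditions and select the maximin one, rather than the one with the highest average. The reward matrix below is hypothetical:

```python
# Hypothetical rewards: rows = candidate post-convergence checkpoints,
# columns = perturbed evaluation conditions (e.g., varied mass, terrain).
rewards = [
    [0.92, 0.40, 0.88],  # highest average, but poor worst case
    [0.75, 0.70, 0.72],  # lower average, strong worst case
    [0.85, 0.55, 0.60],
]

def maximin_select(reward_matrix):
    """Pick the checkpoint whose worst-case reward across conditions is largest."""
    worst = [min(row) for row in reward_matrix]
    return max(range(len(worst)), key=worst.__getitem__)

best = maximin_select(rewards)
```

The maximin choice (index 1) differs from the average-reward choice (index 0), which is exactly the situation where heuristic cherry-picking by peak reward misleads deployment decisions.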


AI Hype: What Does Google's `Underspecification` Bombshell Mean For Machine Learning Credibility?

#artificialintelligence

Last week, Google released "Underspecification Presents Challenges for Credibility in Modern Machine Learning", a paper that has been sending shockwaves through the machine learning community. The paper highlights a particularly thorny problem: even if machine learning models pass tests equally well, they don't perform equally well in the real world. Models failing to match their test performance in the real world have long been a known bugbear, but this work is the first to publicly prove and name underspecification as a cause. Before we talk about handling underspecification, however, we need to describe how machine learning models are put together and what the problem is. The standard training process rests on a core tenet: good performance on the testing sample means good performance on real-world data, barring systematic changes between testing and the real world (called data shift or bias). For instance, a model forecasting clothing sales after three months of winter data is likely to struggle come summertime, having learned a lot about coats but very little about shorts.
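Underspecification itself can be sketched in a few lines: two predictors that tie exactly on the test set but diverge once the data distribution shifts. The two-feature data and labels below are contrived purely for illustration:

```python
# Each example is ((feature_0, feature_1), label). On the test set the two
# features agree, so predictors relying on different features are indistinguishable.
test_set = [((1, 1), 1), ((0, 0), 0), ((1, 1), 1), ((0, 0), 0)]
# After a distribution shift, the spurious feature (index 1) flips.
shifted = [((1, 0), 1), ((0, 1), 0)]

model_a = lambda x: x[0]  # relies on the (assumed) causal feature
model_b = lambda x: x[1]  # relies on the spurious feature

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

acc_test_a = accuracy(model_a, test_set)    # both models are perfect on test data
acc_test_b = accuracy(model_b, test_set)
acc_shift_a = accuracy(model_a, shifted)    # but only one survives the shift
acc_shift_b = accuracy(model_b, shifted)
```

The test pipeline cannot distinguish the two models, yet their real-world behavior differs completely; that gap, hidden by identical test scores, is what the paper names underspecification.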